Simulation Study of Imbalanced Classification on High-Dimensional Gene Expression Data
Authors
Abstract
Purpose: Classification of gene expression data helps in the study of disease. However, it faces two obstacles: imbalanced classes and high dimensionality. The motivation of this study is to examine the effectiveness of undersampling before feature selection on high-dimensional data with imbalanced classes.
Methods: The Least Absolute Shrinkage and Selection Operator (Lasso) is used because it can select features and so handle high-dimensional modeling. Random Undersampling (RUS) is used to deal with the imbalanced classes. The Classification and Regression Tree (CART) algorithm, a decision tree method, is used to construct the classification model because it produces an interpretable model. Thirty simulated datasets with varying imbalance ratios are used to test the proposed approaches, Lasso-CART and RUS-Lasso-CART. The datasets are generated from parameters taken from real data.
Results: The simulation results show that when the minority class accounts for more than 25% of the observations, the Lasso-CART method is appropriate. Meanwhile, RUS-Lasso-CART is effective when the minority class has at least 20 observations.
Novelty: The novelty of this study is the use of a hybrid approach to address the problem of high-dimensional data with imbalanced classes.
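The two pipelines compared in the abstract, Lasso-CART and RUS-Lasso-CART, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. scikit-learn's L1-penalized `LogisticRegression` stands in for the Lasso. `DecisionTreeClassifier` (an optimized CART variant) stands in for CART. The generator `simulate` and its parameters (`p`, `k`, `shift`), the penalty strength `C`, and the fallback used when Lasso drops every feature are all hypothetical. They are not the paper's real-data-based simulation settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.tree import DecisionTreeClassifier


def simulate(n_major, n_minor, p=500, k=10, shift=1.0, seed=0):
    """Hypothetical generator: k informative genes are shifted by `shift` in the minority class."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_major + n_minor, p))
    y = np.r_[np.zeros(n_major, dtype=int), np.ones(n_minor, dtype=int)]
    X[y == 1, :k] += shift
    return X, y


def random_undersample(X, y, seed=0):
    """Random undersampling (RUS): keep a random subset of each class equal to the smallest class size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes]
    )
    return X[keep], y[keep]


def lasso_cart(X, y, C=0.1, seed=0):
    """Lasso-CART: L1-penalized selection of genes, then a CART tree on the selected genes."""
    # Assumption: L1 logistic regression is used as the Lasso for a binary outcome.
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    lasso.fit(X, y)
    selected = np.flatnonzero(lasso.coef_[0])
    if selected.size == 0:  # hypothetical fallback: keep all features if Lasso drops them all
        selected = np.arange(X.shape[1])
    tree = DecisionTreeClassifier(random_state=seed).fit(X[:, selected], y)
    return selected, tree


def rus_lasso_cart(X, y, C=0.1, seed=0):
    """RUS-Lasso-CART: balance the classes first, then run Lasso-CART on the balanced sample."""
    Xb, yb = random_undersample(X, y, seed=seed)
    return lasso_cart(Xb, yb, C=C, seed=seed)


if __name__ == "__main__":
    # Hypothetical 10% minority-class setting with 20 minority observations.
    X_train, y_train = simulate(n_major=180, n_minor=20, seed=1)
    X_test, y_test = simulate(n_major=180, n_minor=20, seed=2)
    for name, fit in [("Lasso-CART", lasso_cart), ("RUS-Lasso-CART", rus_lasso_cart)]:
        selected, tree = fit(X_train, y_train)
        pred = tree.predict(X_test[:, selected])
        print(name, len(selected), round(balanced_accuracy_score(y_test, pred), 3))
```

The two pipelines differ only in where undersampling happens. RUS-Lasso-CART balances the classes before the Lasso chooses features, which is intended to keep feature selection from being driven by the majority class. The cost is that majority-class observations are discarded.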
Similar resources
Classification of High Dimensional and Imbalanced Hyperspectral Imagery Data
The present paper addresses the problem of the classification of hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA is applied to reduce the number of spectral bands. This is a preliminary study that aims to investigate the benefits of using these two techniques together, and also to evaluate ...
On Mining Fuzzy Classification Rules for Imbalanced Data
A fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification purposes. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, such that it performs poorly with respect to the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...
Class-imbalanced classifiers for high-dimensional data
A class-imbalanced classifier is a decision rule to predict the class membership of new samples from an available data set where the class sizes differ considerably. When the class sizes are very different, most standard classification algorithms may favor the larger (majority) class, resulting in poor accuracy in the minority class prediction. A class-imbalanced classifier typically modifies a ...
SVM Classification for High-dimensional Imbalanced Data based on SNR and Under-sampling
The support vector machine (SVM) is biased towards the majority class when a dataset is class-imbalanced, and the bias is even larger for high-dimensional data. In order to improve the classification accuracy of SVM on high-dimensional imbalanced data, we combine the signal-to-noise ratio (SNR) with an under-sampling technique based on K-means. In this article, we first apply SNR to feature selection to r...
Journal
Journal title: Scientific Journal of Informatics
Year: 2023
ISSN: 2407-7658, 2460-0040
DOI: https://doi.org/10.15294/sji.v10i1.40589